Scaling Laws for Discriminative Speech Recognition Rescoring Models
Recent studies have found that model performance has a smooth power-law
relationship, or scaling laws, with training data and model size, for a wide
range of problems. These scaling laws allow one to choose nearly optimal data
and model sizes. We study whether this scaling property is also applicable to
second-pass rescoring, which is an important component of speech recognition
systems. We focus on RescoreBERT as the rescoring model, which uses a
pre-trained Transformer-based architecture fine-tuned with an ASR
discriminative loss. Using such a rescoring model, we show that the word error
rate (WER) follows a scaling law for over two orders of magnitude as training
data and model size increase. In addition, we find that a pre-trained model
requires less data than a randomly initialized model of the same size,
representing the effective data transferred from the pre-training step. This
effective data transferred is also found to follow a scaling law with data and
model size.
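As a concrete illustration of the fitted form, the sketch below estimates a power law WER ≈ a · N^(-b) from (data size, WER) pairs via a log-log linear fit. The data points and variable names are invented for illustration and are not taken from the paper.

```python
# Hypothetical sketch: fitting a power-law scaling curve WER = a * N^(-b)
# to (training-set size, WER) measurements via a log-log linear fit.
# All numbers below are made-up placeholders.
import numpy as np

train_sizes = np.array([1e4, 1e5, 1e6, 1e7])  # training data size (assumed units)
wers = np.array([12.1, 9.0, 6.8, 5.1])        # measured WER (%) at each size

# In log space, WER = a * N^(-b) becomes a line: log(WER) = log(a) - b*log(N)
slope, intercept = np.polyfit(np.log(train_sizes), np.log(wers), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted scaling law: WER = {a:.2f} * N^(-{b:.3f})")

# Extrapolate to a larger, unseen data size
print("predicted WER at N=1e8:", a * 1e8 ** (-b))
```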
Discriminative Speech Recognition Rescoring with Pre-trained Language Models
Second pass rescoring is a critical component of competitive automatic speech
recognition (ASR) systems. Large language models have demonstrated their
ability to use pre-trained information for better rescoring of ASR
hypotheses. Discriminative training, which directly optimizes the minimum
word error rate (MWER) criterion, typically improves rescoring. In this study,
we propose and explore several discriminative fine-tuning schemes for
pre-trained LMs. We propose two architectures based on different pooling
strategies over output embeddings and compare them with probability-based MWER training. We
conduct detailed comparisons between pre-trained causal and bidirectional LMs
in discriminative settings. Experiments on LibriSpeech demonstrate that all
MWER training schemes are beneficial, giving additional gains of up to 8.5% WER.
The proposed pooling variants achieve lower latency while retaining most of the
improvements. Finally, our study concludes that bidirectionality is better
utilized with discriminative training.
Comment: ASRU 2023
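For readers unfamiliar with the MWER criterion, here is a minimal PyTorch sketch of one common probability-based formulation over an N-best list. The function name, shapes, and the mean-error baseline are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a probability-based MWER loss over an N-best list.
import torch
import torch.nn.functional as F

def mwer_loss(scores, word_errors):
    """scores: (batch, n_best) rescoring-model scores (higher = better).
    word_errors: (batch, n_best) edit-distance errors of each hypothesis."""
    probs = F.softmax(scores, dim=-1)                  # renormalize over N-best
    baseline = word_errors.mean(dim=-1, keepdim=True)  # average errors in list
    # Expected relative error: probability mass on worse-than-average
    # hypotheses increases the loss; mass on better ones decreases it.
    return (probs * (word_errors - baseline)).sum(dim=-1).mean()

# Toy usage: one utterance with a 4-best list.
scores = torch.tensor([[2.0, 1.5, 0.3, -1.0]], requires_grad=True)
errors = torch.tensor([[0.0, 1.0, 2.0, 5.0]])
loss = mwer_loss(scores, errors)
loss.backward()  # gradient favors low-error hypotheses
```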
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
We explore the ability of large language models (LLMs) to act as speech
recognition post-processors that perform rescoring and error correction. Our
first focus is on instruction prompting to let LLMs perform these tasks without
fine-tuning, for which we evaluate different prompting schemes, including zero- and
few-shot in-context learning and a novel task-activating prompting method that
combines causal instructions and demonstrations to increase the effective context window.
Next, we show that rescoring only by in-context learning with frozen LLMs
achieves results that are competitive with rescoring by domain-tuned LMs, using
a pretrained first-pass recognition system and rescoring output on two
out-of-domain tasks (ATIS and WSJ). By combining prompting techniques with
fine-tuning we achieve error rates below the N-best oracle level, showcasing
the generalization power of the LLMs.
Comment: Accepted to IEEE Automatic Speech Recognition and Understanding
(ASRU) 2023. 8 pages. Second version, revised from the Sep 29th version.
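A hypothetical sketch of how such zero- and few-shot prompts might be assembled for N-best error correction is shown below. The function `build_correction_prompt`, the instruction wording, and the example hypotheses are all invented for illustration; the resulting string would be sent to any frozen LLM completion endpoint.

```python
# Hypothetical sketch: packing an instruction, optional in-context
# demonstrations, and an N-best list into a single correction prompt.
def build_correction_prompt(nbest, examples=()):
    lines = ["Instruction: Pick or rewrite the most likely transcript "
             "from the ASR hypotheses below."]
    for hyps, reference in examples:  # few-shot demonstrations
        lines.append("Hypotheses: " + " | ".join(hyps))
        lines.append("Answer: " + reference)
    lines.append("Hypotheses: " + " | ".join(nbest))
    lines.append("Answer:")
    return "\n".join(lines)

# Toy usage: one demonstration, then the query N-best list.
demos = [(["i red a book", "i rode a book"], "i read a book")]
nbest = ["show me flights to austin", "show me flights to boston"]
print(build_correction_prompt(nbest, demos))
# The LLM's completion after "Answer:" is taken as the corrected transcript.
```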
Personalization for BERT-based Discriminative Speech Recognition Rescoring
Recognition of personalized content remains a challenge in end-to-end speech
recognition. We explore three novel approaches that use personalized content in
a neural rescoring step to improve recognition: gazetteers, prompting, and a
cross-attention based encoder-decoder model. We use internal de-identified
en-US data from interactions with a virtual voice assistant supplemented with
personalized named entities to compare these approaches. On a test set with
personalized named entities, we show that each of these approaches improves
word error rate by over 10% relative to a neural rescoring baseline. We also show
that on this test set, natural language prompts can improve word error rate by
7% without any training and with only a marginal loss in generalization. Overall,
gazetteers were found to perform best, with a 10% improvement in word error
rate (WER), while also improving WER on a general test set by 1%.
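To make the gazetteer idea concrete, here is a hypothetical sketch of the kind of feature such an approach could feed a rescorer: a per-token indicator of membership in the user's personalized entity list. The function and the token-level matching rule are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: tag hypothesis tokens that appear in a user's
# personalized entity list (a gazetteer), producing a binary feature the
# rescoring model could consume alongside the token sequence.
def gazetteer_tags(hypothesis, entities):
    entity_tokens = {tok for name in entities for tok in name.lower().split()}
    return [int(tok in entity_tokens) for tok in hypothesis.lower().split()]

contacts = {"Ana Sofia", "DJ Quest"}
print(gazetteer_tags("call ana sofia now", contacts))  # -> [0, 1, 1, 0]
```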
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
We propose a neural language modeling system based on low-rank adaptation
(LoRA) for speech recognition output rescoring. Although pretrained language
models (LMs) like BERT have shown superior performance in second-pass
rescoring, the high computational cost of scaling up the pretraining stage and of
adapting the pretrained models to specific domains limits their practical use in
rescoring. Here we present a method based on low-rank decomposition to train a
rescoring BERT model and adapt it to new domains using only a fraction (0.08%)
of the pretrained parameters. These inserted matrices are optimized through a
discriminative training objective along with a correlation-based regularization
loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is
evaluated on LibriSpeech and internal datasets, with training times decreased by
factors between 3.6 and 5.4.
Comment: Accepted to IEEE ASRU 2023. 8 pages.
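The low-rank decomposition itself is standard LoRA; below is a minimal PyTorch sketch of the idea (an assumption of mine, not the LoRB code). The pretrained weight stays frozen while a trainable low-rank update B·A of small rank r is added, so only a tiny fraction of parameters is optimized.

```python
# Minimal LoRA sketch: a frozen linear layer plus a trainable low-rank
# update, y = W x + (alpha / r) * B (A x), with B initialized to zero so
# training starts from the pretrained behavior.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # pretrained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 768 = 12288, a small fraction of 768*768 + 768
```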
Analysis of gas-particle flows through multi-scale simulations
Multi-scale structures are inherent in gas-solid flows, which renders modeling efforts challenging. On one hand, detailed simulations, in which the fine structures are resolved and particle properties can be directly specified, can account for complex flow behaviors, but they are too computationally expensive to apply to larger systems. On the other hand, coarse-grained simulations demand far less computation, but they necessitate constitutive models that are often not readily available for given particle properties. The present study focuses on addressing this issue, as it seeks to provide a general framework through which one can obtain the required constitutive models from detailed simulations.
To demonstrate the viability of this general framework, in which closures can be proposed for different particle properties, we focus on the van der Waals force of interaction between particles. We start with Computational Fluid Dynamics (CFD) - Discrete Element Method (DEM) simulations, where the fine structures are resolved and the van der Waals force between particles can be directly specified, and obtain the closures for stress and drag that are required for coarse-grained simulations. Specifically, we develop a new cohesion model that appropriately accounts for the van der Waals force between particles in CFD-DEM simulations. We then validate this cohesion model and the CFD-DEM approach by showing that they qualitatively capture experimental results in which the addition of small particles to gas fluidization reduces bubble sizes. Based on the DEM and CFD-DEM simulation results, we propose stress models that account for the van der Waals force between particles. Finally, we apply machine learning, specifically neural networks, to obtain a drag model that captures the effects of fine structures and inter-particle cohesion. We show that this approach, which can readily be applied to closures other than drag, can take advantage of the large amount of data generated from simulations and therefore offers superior modeling performance over traditional approaches.
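The neural-network drag closure can be pictured as a small regression model mapping filtered local quantities to a correction on a homogeneous drag law. The sketch below is a hypothetical illustration of that idea; the feature choices (voidage, slip speed, a cohesion number), the network size, and the placeholder data are assumptions, not the study's actual inputs.

```python
# Hypothetical sketch: an MLP drag closure mapping filtered local flow
# features to a drag correction factor H = F_resolved / F_homogeneous.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),  # predicted drag correction factor
)

# Training data would come from filtering resolved CFD-DEM fields;
# here it is random placeholder data of the right shape.
features = torch.rand(1024, 3)   # (voidage, slip speed, cohesion number)
targets = torch.rand(1024, 1)    # filtered drag corrections from fine-grid data

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(100):             # a few fitting steps for illustration
    opt.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()
    opt.step()
```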